PATTERN CLASSIFICATION TECHNIQUES APPLIED TO
HIGH RESOLUTION SYNTHETIC APERTURE RADAR IMAGERY


INTRODUCTION 

Automatic pattern classification of remotely sensed imagery has been studied for many years. The purpose of this research note is to present the results of applying 10 different pattern classification techniques to samples of synthetic aperture radar (SAR) imagery. Pattern classification methods have previously been applied to various types of optical imagery. However, little work has been done on applying them to high resolution radar imagery. Applying such methods is a necessary step toward automatic classification of terrain features from radar imagery. The methods are general and can be applied to any type of imagery that can be represented by a feature vector. The following sections briefly discuss the classification methods used and present the final results.

METHODOLOGY 

The pattern classification methods used in this study are all standard methods, described in detail in various textbooks on pattern recognition. For this reason, no attempt is made to explain each method in detail; only the most significant equations are presented and discussed.

The imagery consisted of samples of high resolution synthetic aperture radar imagery taken over the Huntsville, Alabama, area with the APD-10 radar system. Sections of the radar imagery were digitized and stored on a digital disk unit. A Lexidata System 3400 image processor and a Hewlett-Packard 1000 computer were used for two tasks:
- displaying the images on a cathode ray tube;
- taking 100 samples of each of four terrain classes from the imagery.

Each sample consisted of a 32 by 32 pixel window located in a homogeneous section of a particular terrain class. The four classes considered were:
(1) cities: a combination of commercial and residential structures, DLMS category #504 FIC 301 and #505 FIC 401;
(2) fields: agricultural land used primarily for crops and pasture, DLMS category #501 FIC 950;
(3) water: rivers with smooth fresh water, DLMS category #510 FIC 940, and fresh water subject to ice, such as lakes and reservoirs, DLMS category #510 FIC 943;
(4) forests: mixed deciduous and evergreen trees, DLMS category #501 FIC 954.

A feature vector was computed for each image sample. It had 13 components for eight of the classification techniques and 15 components for the other two. The components were first- and second-order histogram statistics calculated from the 32 by 32 pixel window; the equations for these histogram measures are given in the appendix. A discriminant analysis technique was used for feature selection. It reduced the dimensionality of the feature vector from 13 to 2 so that the resulting components were optimized for class separability. This feature selection technique was the subject of a previous ETL report¹ and provides a linear transformation of the form

$$ \mathbf{y} = A\mathbf{x} \qquad (1) $$

where $\mathbf{x}$ is the original feature vector of dimensionality 13 × 1, $A$ is the transformation matrix of dimensionality 2 × 13, and $\mathbf{y}$ is the transformed feature vector of dimensionality 2 × 1. The solution for the elements of $A$ is given in the previous ETL report.

¹ Richard Hevenor, Application of a Feature Selection Technique to Samples of High Resolution Synthetic Aperture Radar Imagery, U.S. Army Engineer Topographic Laboratories, Fort Belvoir, VA, ETL-0330.

For the Bayes classifier, all 15 components of the feature vector were used to compute the decision functions. For the minimum distance classifier, five of the 15 components listed in the appendix were selected as the most separable. The selection was made by visual inspection of a set of more than 50 training samples. These five components were the mean, the variance, the skewness, the autocorrelation, and the covariance. After feature selection, the pattern classifiers were implemented and applied to the 400 samples of radar imagery. The pattern classification techniques implemented were

1. Ho-Kashyap Algorithm.

2. Increment Correction Algorithm.

3. Least Mean Square Error Algorithm.

4. Method of Potential Functions.

5. Fisher Linear Discriminant.

6. Pseudoinverse Technique.

7. Widrow-Hoff Procedure.

8. Relaxation Algorithm.

9. Bayes with Normal Distribution.

10. Minimum Distance Classifier.

A  short  discussion  of  each  of  these  techniques  follows. 


Ho-Kashyap  Algorithm 

The Ho-Kashyap algorithm, as explained in the text by Tou and Gonzalez², is a trainable, nonparametric classifier. It attempts to find a weight vector $\mathbf{w}$ such that, for the two-class problem,

$$ \mathbf{w}^T\hat{\mathbf{y}} > 0 \quad \text{if } \hat{\mathbf{y}} \in \omega_1 \qquad (2) $$
$$ \mathbf{w}^T\hat{\mathbf{y}} < 0 \quad \text{if } \hat{\mathbf{y}} \in \omega_2 \qquad (3) $$

where $\omega_1$ represents class 1, $\omega_2$ represents class 2, and $\hat{\mathbf{y}}$ is the vector $\mathbf{y}$ augmented by 1:

$$ \hat{\mathbf{y}} = \begin{bmatrix} y_1 \\ y_2 \\ 1 \end{bmatrix} \qquad (4) $$

In equation (4), $y_1$ and $y_2$ are the components of $\mathbf{y}$. For our case the dimensionality of $\hat{\mathbf{y}}$ and $\mathbf{w}$ is three. The superscript $T$ on $\mathbf{w}$ in the inequalities above denotes the transpose.

² J. T. Tou and R. C. Gonzalez, Pattern Recognition Principles, Addison-Wesley, 1974.

An iterative solution for $\mathbf{w}$ can be obtained as follows:


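$$ \mathbf{w}(k+1) = \mathbf{w}(k) + c\,Y^{\#}\left[\mathbf{e}(k) + |\mathbf{e}(k)|\right] \qquad (5) $$
$$ \mathbf{e}(k) = Y\mathbf{w}(k) - \mathbf{b}(k) \qquad (6) $$
$$ \mathbf{b}(k+1) = \mathbf{b}(k) + c\left[\mathbf{e}(k) + |\mathbf{e}(k)|\right] \qquad (7) $$

where $\mathbf{w}(1) = Y^{\#}\mathbf{b}(1)$ and $\mathbf{b}(1) > 0$ but otherwise arbitrary, and

$$ Y^{\#} = (Y^T Y)^{-1} Y^T, \qquad Y = \begin{bmatrix} \hat{\mathbf{y}}_1^T \\ \hat{\mathbf{y}}_2^T \\ \vdots \\ \hat{\mathbf{y}}_N^T \end{bmatrix} $$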


$N$ is the total number of pattern points for the two classes, and $|\mathbf{e}(k)|$ denotes the vector whose components are the absolute values of the components of $\mathbf{e}(k)$. The iteration index $k$ applies to both $\mathbf{w}$ and the vector $\mathbf{b}$, so both are updated on each iterative pass. The $\hat{\mathbf{y}}$ vectors belonging to class 2 are multiplied by -1 before insertion into the matrix $Y$. A solution for $\mathbf{w}$ is obtained when $0 < c < 1$, provided the classes are linearly separable in the first place.
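Equations (5) through (7) can be sketched in NumPy as shown below. This is a minimal illustration, not the program used in the study.
- It assumes $Y$ has full column rank, so that $(Y^T Y)^{-1}$ exists.
- The names `ho_kashyap` and `augment`, the starting vector $\mathbf{b}(1) = \mathbf{1}$, the value $c = 0.5$, and the stopping tests are illustrative choices.
- The four data points are hypothetical.

```python
import numpy as np


def ho_kashyap(Y, c=0.5, max_iter=1000, tol=1e-9):
    """Ho-Kashyap iteration of equations (5)-(7).

    Y: N x 3 array whose rows are augmented patterns y_hat^T, with the
    class-2 rows already multiplied by -1. Returns (w, found), where found
    is True once every row satisfies w^T y_hat > 0.
    """
    Y = np.asarray(Y, dtype=float)
    Y_sharp = np.linalg.inv(Y.T @ Y) @ Y.T   # Y# = (Y^T Y)^-1 Y^T
    b = np.ones(Y.shape[0])                  # b(1) > 0, otherwise arbitrary
    w = Y_sharp @ b                          # w(1) = Y# b(1)
    for _ in range(max_iter):
        if np.all(Y @ w > 0):                # every pattern on its correct side
            return w, True
        e = Y @ w - b                        # equation (6)
        if np.all(e <= tol) and np.any(e < -tol):
            return w, False                  # e <= 0 but e != 0: not separable
        step = c * (e + np.abs(e))           # only positive error components act
        w = w + Y_sharp @ step               # equation (5)
        b = b + step                         # equation (7)
    return w, False


def augment(points):
    """Append the constant 1 to each 2-D vector y, giving y_hat (equation (4))."""
    points = np.asarray(points, dtype=float)
    return np.hstack([points, np.ones((len(points), 1))])


# Hypothetical transformed vectors y for two classes.
class1 = [[1.0, 2.0], [2.0, 1.0]]
class2 = [[-1.0, -2.0], [-2.0, -1.0]]
Y = np.vstack([augment(class1), -augment(class2)])   # class-2 rows negated
w, found = ho_kashyap(Y)
print(found, w)
```

For these four points the initial pseudoinverse solution $\mathbf{w}(1) = (1/3, 1/3, 0)$ already separates the classes, so the loop stops immediately. On overlapping classes, the second stopping test signals that no separating $\mathbf{w}$ exists: an error vector with no positive components that is not zero. This is the usual Ho-Kashyap test for nonseparable data.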


Stochastic  Approximation  Methods 

Stochastic approximation is a general approach to deriving statistical pattern classification algorithms. These methods use the training set data to approximate the a posteriori probabilities $P(\omega_i \mid \hat{\mathbf{y}})$, where $\omega_i$ represents the class. They are nonparametric and allow for noise in the training samples. Three stochastic approximation methods were used in this work: the increment correction algorithm, the least mean square error algorithm, and the method of potential functions.


Increment  Correction  Algorithm 


The increment correction algorithm is discussed by Tou and Gonzalez³ and assumes a linear approximation to the a posteriori probability:

$$ P(\omega_1 \mid \hat{\mathbf{y}}) \approx \mathbf{w}^T\hat{\mathbf{y}} \qquad (8) $$

where $\mathbf{w}$ is a weight vector to be determined. An iterative solution for $\mathbf{w}$ can be obtained from the following equation:


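$$ \mathbf{w}(k+1) = \mathbf{w}(k) + \alpha_k\,\hat{\mathbf{y}}(k)\,\operatorname{sgn}\left\{ r[\hat{\mathbf{y}}(k)] - \mathbf{w}^T(k)\hat{\mathbf{y}}(k) \right\} \qquad (9) $$

where $\mathbf{w}(1)$ is arbitrary and $\alpha_k = 1/k$ for $k = 1, 2, 3, \ldots$. The other two quantities are defined by

$$ r[\hat{\mathbf{y}}(k)] = \begin{cases} 1 & \text{if } \hat{\mathbf{y}}(k) \in \omega_1 \\ 0 & \text{if } \hat{\mathbf{y}}(k) \in \omega_2 \end{cases} $$

$$ \operatorname{sgn}\left\{ r[\hat{\mathbf{y}}(k)] - \mathbf{w}^T(k)\hat{\mathbf{y}}(k) \right\} = \begin{cases} 1 & \text{if } r[\hat{\mathbf{y}}(k)] > \mathbf{w}^T(k)\hat{\mathbf{y}}(k) \\ -1 & \text{if } r[\hat{\mathbf{y}}(k)] < \mathbf{w}^T(k)\hat{\mathbf{y}}(k) \end{cases} $$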


The form of equation (9) comes from applying the Robbins-Monro algorithm, a standard method for finding the root of a regression function. Once $\mathbf{w}$ has been found to sufficient accuracy, the decision rule is

$$ \text{if } P(\omega_1 \mid \hat{\mathbf{y}}) \approx \mathbf{w}^T\hat{\mathbf{y}} > 1/2, \text{ then } \hat{\mathbf{y}} \in \omega_1 $$
$$ \text{if } P(\omega_1 \mid \hat{\mathbf{y}}) \approx \mathbf{w}^T\hat{\mathbf{y}} < 1/2, \text{ then } \hat{\mathbf{y}} \in \omega_2 $$
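A minimal sketch of the increment correction rule of equation (9) follows. It makes these assumptions:
- NumPy is available.
- $\alpha_k = 1/k$, with $k$ counted over every sample presentation.
- $\mathbf{w}(1) = \mathbf{0}$.
- Training runs for a fixed number of passes through the training set.

The names `increment_correction` and `decide` are hypothetical. The two training vectors are a toy one-dimensional example ($y = 1$ for class 1, $y = -1$ for class 2, each augmented by 1), not radar data.

```python
import numpy as np


def increment_correction(samples, labels, epochs=50):
    """Increment correction rule, equation (9), with alpha_k = 1/k.

    samples: rows are augmented vectors y_hat (class 2 is NOT negated here).
    labels: 1 for class omega_1, 2 for class omega_2.
    """
    samples = np.asarray(samples, dtype=float)
    w = np.zeros(samples.shape[1])          # w(1) is arbitrary; zero chosen here
    k = 1
    for _ in range(epochs):
        for y_hat, label in zip(samples, labels):
            r = 1.0 if label == 1 else 0.0  # r[y_hat(k)]
            diff = r - w @ y_hat
            # sgn{} is +1 or -1; np.sign gives 0 for an exact tie, which
            # leaves w unchanged (a case the text does not define).
            w = w + (1.0 / k) * np.sign(diff) * y_hat
            k += 1
    return w


def decide(w, y_hat):
    """Decision rule: w^T y_hat > 1/2 -> class 1, otherwise class 2."""
    return 1 if w @ y_hat > 0.5 else 2


samples = np.array([[1.0, 1.0], [-1.0, 1.0]])   # hypothetical y = 1 and y = -1, augmented
labels = [1, 2]
w = increment_correction(samples, labels)
print([decide(w, s) for s in samples])   # prints [1, 2]
```

Because the correction uses only the sign of the error, $\mathbf{w}^T\hat{\mathbf{y}}$ for the class-1 sample oscillates around its target $r = 1$, with steps that shrink like $1/k$. Both samples end up on the correct side of the 1/2 threshold.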


Least  Mean  Square  Error  Algorithm 


The least mean square error algorithm, as presented by Tou and Gonzalez⁴, also approximates the a posteriori probability by equation (8). However, the solution for $\mathbf{w}$ is quite different:

$$ \mathbf{w}(k+1) = \mathbf{w}(k) + \alpha_k\,\hat{\mathbf{y}}(k)\left\{ r[\hat{\mathbf{y}}(k)] - \mathbf{w}^T(k)\hat{\mathbf{y}}(k) \right\} \qquad (10) $$

where $\alpha_k$ and $r[\hat{\mathbf{y}}(k)]$ have the same definitions as in the increment correction algorithm. Equation (10) provides an iterative solution for $\mathbf{w}$; it is also the result of applying the Robbins-Monro algorithm. Once $\mathbf{w}$ has been obtained, the decision rule is implemented in the same way as for the increment correction algorithm.
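A minimal NumPy sketch of equation (10) follows. It assumes $\alpha_k = 1/k$ counted over every sample presentation, $\mathbf{w}(1) = \mathbf{0}$, and a fixed number of passes. The function name `least_mean_square` is hypothetical. The toy data are $y = 1$ for class 1 and $y = -1$ for class 2, each augmented by 1. The only difference from the increment correction rule is that the correction is proportional to the error rather than to its sign.

```python
import numpy as np


def least_mean_square(samples, labels, epochs=50):
    """Least mean square error rule, equation (10), with alpha_k = 1/k.

    samples: rows are augmented vectors y_hat; labels: 1 or 2.
    """
    samples = np.asarray(samples, dtype=float)
    w = np.zeros(samples.shape[1])          # w(1) is arbitrary; zero chosen here
    k = 1
    for _ in range(epochs):
        for y_hat, label in zip(samples, labels):
            r = 1.0 if label == 1 else 0.0  # r[y_hat(k)]
            # The correction is proportional to the error r - w^T y_hat.
            w = w + (1.0 / k) * (r - w @ y_hat) * y_hat
            k += 1
    return w


samples = np.array([[1.0, 1.0], [-1.0, 1.0]])   # hypothetical y = 1 and y = -1, augmented
labels = [1, 2]
w = least_mean_square(samples, labels)
print(w @ samples[0], w @ samples[1])   # estimates of P(omega_1 | y_hat)
```

For this toy data the estimate for the class-1 sample approaches $r = 1$ from above (about 1.01 after 50 passes), and the estimate for the class-2 sample stays at 0. This shows equation (10) driving $\mathbf{w}^T\hat{\mathbf{y}}$ toward the indicator $r[\hat{\mathbf{y}}]$ in the least-squares sense.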


Method  of  Potential  Functions 


The method of potential functions is developed by Tou and Gonzalez⁵. It is based on approximating the a posteriori probability by a series expansion:

$$ P(\omega_i \mid \hat{\mathbf{y}}) \approx \sum_{j=1}^{m} c_j(k)\,\phi_j(\hat{\mathbf{y}}) \qquad (11) $$

In this expansion the functions $\phi_j(\hat{\mathbf{y}})$ are a given set of orthonormal functions, and the $c_j(k)$ are unknown coefficients that must be determined. In our case the $\phi_j(\hat{\mathbf{y}})$ were chosen to be a set of Hermite polynomial functions. The decision rule is based simply on the value of $P(\omega_i \mid \hat{\mathbf{y}})$:

$$ \text{if } P(\omega_i \mid \hat{\mathbf{y}}) > P(\omega_j \mid \hat{\mathbf{y}}) \;\; \forall j \neq i, \text{ then } \hat{\mathbf{y}} \in \omega_i \qquad (12) $$

The coefficients $c_j(k)$ are determined iteratively. If the machine correctly classifies the sample pattern $\hat{\mathbf{y}}(k+1)$, then

$$ c_j(k+1) = c_j(k) \qquad (13) $$

When the machine misclassifies the sample pattern $\hat{\mathbf{y}}(k+1)$ and $\hat{\mathbf{y}}(k+1) \in \omega_i$, then

$$ c_j(k+1) = c_j(k) + \gamma_{k+1}\,\phi_j(\hat{\mathbf{y}}(k+1)) \qquad (14) $$

and if $\hat{\mathbf{y}}(k+1) \notin \omega_i$,

$$ c_j(k+1) = c_j(k) - \gamma_{k+1}\,\phi_j(\hat{\mathbf{y}}(k+1)) \qquad (15) $$

In equations (14) and (15), $\gamma_{k+1}$ plays the same role as $\alpha_k$ in the increment correction rule.
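The plain-Python sketch below illustrates equations (11) through (15). Several details are assumptions, not taken from this report:
- The "Hermite polynomial functions" are taken to be the orthonormal Hermite functions, that is, Hermite polynomials times a Gaussian factor. The Gaussian factor is what makes them orthonormal on the real line.
- The expansion is applied to the two components of $\mathbf{y}$, since the third component of $\hat{\mathbf{y}}$ is constant.
- The basis is the product of the first `DEGREE = 3` functions in each coordinate, giving nine terms.
- Each class keeps its own coefficient set, starting at zero.
- $\gamma_k = 1/k$.
- Ties go to the lowest class index.

All function names are hypothetical.

```python
import math

DEGREE = 3  # hypothetical choice: psi_0..psi_2 in each coordinate, so m = 9 terms


def hermite_function(n, x):
    """Orthonormal Hermite function psi_n(x) = H_n(x) exp(-x^2/2) / sqrt(2^n n! sqrt(pi))."""
    h_prev, h = 1.0, 2.0 * x                 # H_0(x), H_1(x)
    if n == 0:
        h = 1.0
    for m in range(1, n):                    # H_{m+1} = 2x H_m - 2m H_{m-1}
        h_prev, h = h, 2.0 * x * h - 2.0 * m * h_prev
    norm = math.sqrt(2.0 ** n * math.factorial(n) * math.sqrt(math.pi))
    return h * math.exp(-x * x / 2.0) / norm


def basis(y):
    """phi_j(y) for a 2-D vector y: products psi_p(y1) * psi_q(y2)."""
    return [hermite_function(p, y[0]) * hermite_function(q, y[1])
            for p in range(DEGREE) for q in range(DEGREE)]


def scores(coeffs, phi):
    """Right-hand side of equation (11) for every class."""
    return [sum(c * f for c, f in zip(c_i, phi)) for c_i in coeffs]


def decide(coeffs, phi):
    """Equation (12); ties go to the lowest class index."""
    s = scores(coeffs, phi)
    return s.index(max(s))


def train_potential(samples, labels, n_classes=2, epochs=20):
    """Equations (13)-(15) with gamma_k = 1/k; labels are 0-based class indices."""
    coeffs = [[0.0] * DEGREE ** 2 for _ in range(n_classes)]
    k = 0
    for _ in range(epochs):
        for y, label in zip(samples, labels):
            k += 1
            phi = basis(y)
            if decide(coeffs, phi) == label:
                continue                                  # equation (13)
            gamma = 1.0 / k
            for i in range(n_classes):
                sign = 1.0 if i == label else -1.0        # (14) true class, (15) others
                coeffs[i] = [c + sign * gamma * f for c, f in zip(coeffs[i], phi)]
    return coeffs


coeffs = train_potential([(0.5, 0.5)], [1])
print(decide(coeffs, basis((0.5, 0.5))))   # prints 1
```

The single-sample usage shows the mechanics. The first presentation is misclassified because both scores are zero, so equation (14) raises the coefficients of the true class and equation (15) lowers those of the other class. From then on the sample is classified correctly, and equation (13) leaves the coefficients unchanged.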


Fisher  Linear  Discriminant 


The Fisher linear discriminant technique is presented by Duda and Hart⁶. It reduces the original vector $\mathbf{x}$ to a scalar by multiplying $\mathbf{x}$ by an appropriate vector $\mathbf{w}$. The decision rule then becomes

$$ \text{if } \mathbf{w}^T\mathbf{x} > w_0, \text{ then } \mathbf{x} \in \omega_1 \qquad (16) $$
$$ \text{if } \mathbf{w}^T\mathbf{x} < w_0, \text{ then } \mathbf{x} \in \omega_2 \qquad (17) $$

where the constant $w_0$ is determined by examining the training set data and $\mathbf{w}$ is given by

$$ \mathbf{w} = S_W^{-1}(\mathbf{m}_1 - \mathbf{m}_2) \qquad (18) $$

Here $S_W$ is the within-class scatter matrix, and $\mathbf{m}_1$ and $\mathbf{m}_2$ are the mean vectors of class 1 and class 2, respectively. $S_W$ is computed as

$$ S_W = \sum_{\mathbf{x} \in \omega_1} (\mathbf{x} - \mathbf{m}_1)(\mathbf{x} - \mathbf{m}_1)^T + \sum_{\mathbf{x} \in \omega_2} (\mathbf{x} - \mathbf{m}_2)(\mathbf{x} - \mathbf{m}_2)^T \qquad (19) $$
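A short NumPy sketch of equations (18) and (19) follows. The report says only that $w_0$ was found by examining the training data. Placing $w_0$ midway between the projected class means is an assumption made here for illustration. The function name and sample points are also hypothetical.

```python
import numpy as np


def fisher_discriminant(X1, X2):
    """Return (w, w0) for the Fisher rule of equations (16)-(19).

    w0 is placed midway between the projected class means, one simple way
    of examining the training data (an assumption, not the report's choice).
    """
    X1, X2 = np.asarray(X1, dtype=float), np.asarray(X2, dtype=float)
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    D1, D2 = X1 - m1, X2 - m2
    S_w = D1.T @ D1 + D2.T @ D2               # equation (19): sum of outer products
    w = np.linalg.solve(S_w, m1 - m2)         # equation (18) without forming S_w^-1
    w0 = 0.5 * (w @ m1 + w @ m2)
    return w, w0


X1 = np.array([[2.0, 2.0], [3.0, 3.0], [2.0, 3.0], [3.0, 2.0]])   # hypothetical class 1
X2 = -X1                                                          # hypothetical class 2
w, w0 = fisher_discriminant(X1, X2)
print(w, w0)
```

Here each class's scatter is the identity matrix, so $S_W = 2I$ and $\mathbf{w}$ is simply proportional to $\mathbf{m}_1 - \mathbf{m}_2$. With correlated features, $S_W^{-1}$ rotates $\mathbf{w}$ away from the line joining the means.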


⁵ Ibid.

⁶ R. O. Duda and P. E. Hart, Pattern Classification and Scene Analysis, Wiley-Interscience, 1973.


Pseudoinverse  Technique 


This technique is a linear classifier developed in detail in Duda and Hart⁷. It attempts to find a vector $\mathbf{a}$ such that

$$ \text{if } \mathbf{a}^T\hat{\mathbf{y}} > 0, \text{ then } \hat{\mathbf{y}} \in \omega_1 \qquad (20) $$
$$ \text{if } \mathbf{a}^T\hat{\mathbf{y}} < 0, \text{ then } \hat{\mathbf{y}} \in \omega_2 \qquad (21) $$

For our case the vector $\mathbf{a}$ has three components. A solution for $\mathbf{a}$ can be obtained by forming a matrix $H$ from all the training samples of the two classes. Each row of $H$ consists of a sample $\hat{\mathbf{y}}^T$, with the samples from $\omega_2$ multiplied by -1. The equation relating $H$ and $\mathbf{a}$ is

$$ H\mathbf{a} = \mathbf{b} \qquad (22) $$

The vector $\mathbf{b}$ has as many components as there are training samples from the two classes, and each component of $\mathbf{b}$ is an arbitrarily specified positive constant. Duda and Hart⁸ show how the following solution for $\mathbf{a}$ follows from (22); it minimizes the squared error $\|H\mathbf{a} - \mathbf{b}\|^2$:

$$ \mathbf{a} = (H^T H)^{-1} H^T \mathbf{b} \qquad (23) $$

Once the components of $\mathbf{a}$ have been calculated for each possible pair of classes, the classifier is complete.
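Equation (23) can be sketched with NumPy as follows. The sketch assumes $H$ already holds the sign-normalized augmented samples and $\mathbf{b}$ is all ones. The function name and data are hypothetical. Solving the normal equations with `np.linalg.solve` gives the same $\mathbf{a}$ as equation (23) without forming the inverse explicitly.

```python
import numpy as np


def pseudoinverse_weights(H, b=None):
    """Solve equation (23), a = (H^T H)^-1 H^T b, for sign-normalized H.

    b defaults to all ones (the text only requires positive constants).
    """
    H = np.asarray(H, dtype=float)
    b = np.ones(H.shape[0]) if b is None else np.asarray(b, dtype=float)
    # Solving the normal equations avoids forming an explicit inverse.
    return np.linalg.solve(H.T @ H, H.T @ b)


# Rows y_hat^T for class 1 and -y_hat^T for class 2 (hypothetical data).
H = np.array([[1.0, 2.0, 1.0],
              [2.0, 1.0, 1.0],
              [1.0, 2.0, -1.0],
              [2.0, 1.0, -1.0]])
a = pseudoinverse_weights(H)
print(a, H @ a)
```

For this data the least-squares solution happens to satisfy $H\mathbf{a} = \mathbf{b}$ exactly, so every margin equals 1. `np.linalg.lstsq(H, b, rcond=None)` returns the same least-squares solution and also copes with a rank-deficient $H$.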


Widrow-Hoff  Procedure 


The Widrow-Hoff procedure, as explained in Duda and Hart⁹, uses an iterative technique to solve for the vector $\mathbf{a}$ in the equation $H\mathbf{a} = \mathbf{b}$ given above. The iterative solution is derived in Duda and Hart¹⁰ and is

$$ \mathbf{a}_{k+1} = \mathbf{a}_k - \rho_k H^T(H\mathbf{a}_k - \mathbf{b}) \qquad (24) $$

where $\mathbf{a}_1$ is arbitrary, $\rho_k = \rho_1 / k$, and $\rho_1$ is any positive constant. For our problem we set $\rho_1 = 1$. The Widrow-Hoff procedure has one advantage over the pseudoinverse technique: it can obtain a solution even when the matrix $H^T H$ is singular.
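The batch iteration of equation (24) can be sketched as shown below. It uses $\rho_1 = 1$ as in the text, with $\mathbf{a}_1 = \mathbf{0}$ and a fixed iteration count chosen for illustration. The function name and the four sign-normalized rows are hypothetical.

```python
import numpy as np


def widrow_hoff(H, b=None, rho1=1.0, n_iter=500):
    """Batch Widrow-Hoff iteration of equation (24) with rho_k = rho1 / k."""
    H = np.asarray(H, dtype=float)
    b = np.ones(H.shape[0]) if b is None else np.asarray(b, dtype=float)
    a = np.zeros(H.shape[1])                  # a_1 is arbitrary; zero chosen here
    for k in range(1, n_iter + 1):
        a = a - (rho1 / k) * (H.T @ (H @ a - b))   # equation (24)
    return a


H = np.array([[1.0, 2.0, 1.0],
              [2.0, 1.0, 1.0],
              [1.0, 2.0, -1.0],
              [2.0, 1.0, -1.0]])
a = widrow_hoff(H)
print(a)
```

With $\rho_1 = 1$ and this $H$, the early iterates grow by several orders of magnitude before the shrinking step size $\rho_1/k$ damps them. After that, $\mathbf{a}$ settles on the least-squares solution $(1/3, 1/3, 0)$. The step-size schedule therefore matters in practice, even though the method converges.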


⁷ Ibid.
⁸ Ibid.
⁹ Ibid.
¹⁰ Ibid.

Relaxation Algorithm


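The relaxation algorithm is developed in detail in Duda and Hart¹¹. It provides a method for finding a vector $\mathbf{a}$ such that

$$ \text{if } \mathbf{a}^T\hat{\mathbf{y}} > b, \text{ then } \hat{\mathbf{y}} \in \omega_1 \qquad (25) $$
$$ \text{if } \mathbf{a}^T\hat{\mathbf{y}} < b, \text{ then } \hat{\mathbf{y}} \in \omega_2 \qquad (26) $$

where $b$ is a constant. An iterative solution for $\mathbf{a}$ is

$$ \mathbf{a}_{k+1} = \mathbf{a}_k + \rho\,\frac{b - \mathbf{a}_k^T\hat{\mathbf{y}}^k}{\|\hat{\mathbf{y}}^k\|^2}\,\hat{\mathbf{y}}^k, \qquad \mathbf{a}_1 \text{ arbitrary} \qquad (27) $$

where $\hat{\mathbf{y}}^k$ is the $k$th sample point from the training set and $\|\hat{\mathbf{y}}^k\|$ is its Euclidean norm, or magnitude. For our problem, $b$ was set equal to 1 and $\rho$ was set equal to 0.5.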
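Equation (27) can be sketched as a single-sample procedure that cycles through the training set, with $b = 1$ and $\rho = 0.5$ as in the text. Two details are assumptions taken from Duda and Hart's formulation rather than from this report:
- The rows of $H$ are the sign-normalized augmented samples.
- A row is used for a correction only when $\mathbf{a}^T\hat{\mathbf{y}} \le b$.

The function name, $\mathbf{a}_1 = \mathbf{0}$, the pass limit, and the data are illustrative.

```python
import numpy as np


def relaxation(H, b=1.0, rho=0.5, epochs=100):
    """Single-sample relaxation, equation (27), cycling through the rows of H.

    Following Duda and Hart, a row y is used for a correction only when it
    violates the margin (a^T y <= b).
    """
    H = np.asarray(H, dtype=float)
    a = np.zeros(H.shape[1])                  # a_1 is arbitrary; zero chosen here
    for _ in range(epochs):
        corrected = False
        for y in H:
            if a @ y <= b:
                a = a + rho * (b - a @ y) / (y @ y) * y   # equation (27)
                corrected = True
        if not corrected:                     # every row already exceeds the margin
            break
    return a


H = np.array([[1.0, 2.0, 1.0],
              [2.0, 1.0, 1.0],
              [1.0, 2.0, -1.0],
              [2.0, 1.0, -1.0]])
a = relaxation(H)
print(H @ a)   # margins a^T y for each row
```

Because $\rho < 1$, each correction closes only part of the gap to the margin. The margins therefore approach $b$ from below, and the early-exit test rarely fires. In this data every row has a positive inner product with every other row, so the largest gap is at least halved on each pass. After 100 passes all margins equal $b$ up to rounding.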


Bayes  with  Normal  Distribution 

If the probability density functions of the patterns can be assumed to be multivariate normal (Gaussian), the Bayes classifier becomes practical for developing decision functions. The Bayes decision function for normal patterns is given by Tou and Gonzalez¹² as

$$ d_i(\mathbf{x}) = \ln p(\omega_i) - \tfrac{1}{2}\ln|C_i| - \tfrac{1}{2}\left[(\mathbf{x} - \mathbf{m}_i)^T C_i^{-1}(\mathbf{x} - \mathbf{m}_i)\right], \quad i = 1, 2, \ldots, M \qquad (28) $$

where $M$ is the total number of classes. The mean vector is $\mathbf{m}_i = E_i\{\mathbf{x}\}$, and the covariance matrix is $C_i = E_i\{(\mathbf{x} - \mathbf{m}_i)(\mathbf{x} - \mathbf{m}_i)^T\}$. $E_i\{\cdot\}$ denotes the expectation operator over the patterns of class $\omega_i$. The unknown pattern $\mathbf{x}$ is assigned to class $\omega_i$ if $d_i(\mathbf{x}) > d_j(\mathbf{x})$ for all $j \neq i$. This decision function assumes zero loss for correct classifications and equal loss for misclassifications.

The quantity $p(\omega_i)$ represents the a priori probability of the $i$th class. For our computations, all the a priori probabilities were assumed equal to $1/M$.
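A sketch of equation (28) with NumPy follows. It assumes that each $\mathbf{m}_i$ and $C_i$ are estimated by the sample mean and the unbiased sample covariance (`np.cov`) of that class's training vectors. As in the text, all a priori probabilities equal $1/M$. `np.linalg.slogdet` and `np.linalg.solve` replace an explicit determinant and inverse for numerical stability. The function names and two-class data are hypothetical.

```python
import numpy as np


def fit_normal_classes(class_samples):
    """Estimate (m_i, C_i) for each class from its training vectors."""
    params = []
    for X in class_samples:
        X = np.asarray(X, dtype=float)
        params.append((X.mean(axis=0), np.cov(X, rowvar=False)))
    return params


def bayes_decide(x, params):
    """Return the index i maximizing d_i(x) of equation (28), with p(omega_i) = 1/M."""
    x = np.asarray(x, dtype=float)
    M = len(params)
    d = []
    for m, C in params:
        diff = x - m
        _, logdet = np.linalg.slogdet(C)      # ln|C_i|
        d.append(np.log(1.0 / M) - 0.5 * logdet - 0.5 * diff @ np.linalg.solve(C, diff))
    return int(np.argmax(d))


classes = [
    [[0, 0], [1, 0], [0, 1], [1, 1]],            # hypothetical class 1 samples
    [[10, 10], [11, 10], [10, 11], [11, 11]],    # hypothetical class 2 samples
]
params = fit_normal_classes(classes)
print(bayes_decide([0.5, 0.5], params), bayes_decide([10.2, 10.9], params))
```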


Minimum  Distance  Classifier 


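Suppose all the covariance matrices of equation (28) are equal, $C_i = C$ for $i = 1, 2, \ldots, M$. Suppose in addition that $C = I$, where $I$ is the identity matrix, and that $p(\omega_i) = 1/M$ for $i = 1, 2, \ldots, M$. Then equation (28) reduces to

$$ d_i(\mathbf{x}) = \mathbf{x}^T\mathbf{m}_i - \tfrac{1}{2}\mathbf{m}_i^T\mathbf{m}_i, \quad i = 1, 2, \ldots, M \qquad (29) $$

The terms $\ln(1/M)$ and $-\tfrac{1}{2}\mathbf{x}^T\mathbf{x}$ are the same for every class and have been dropped, and $\ln|I| = 0$. A pattern $\mathbf{x}$ is assigned to class $\omega_i$ if $d_i(\mathbf{x}) > d_j(\mathbf{x})$ for all $j \neq i$.

¹¹ Ibid.

¹² J. T. Tou and R. C. Gonzalez, Pattern Recognition Principles, Addison-Wesley, 1974.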

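Equation (29) is the set of decision functions of a minimum distance pattern classifier, as developed in Tou and Gonzalez¹³. Since $\|\mathbf{x} - \mathbf{m}_i\|^2 = \mathbf{x}^T\mathbf{x} - 2d_i(\mathbf{x})$, the class with the largest $d_i(\mathbf{x})$ is the class whose mean is nearest to $\mathbf{x}$. The mean vector $\mathbf{m}_i = E_i\{\mathbf{x}\}$ and the expectation operator $E_i$ were defined previously.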
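The sketch below evaluates equation (29) for three hypothetical class means. It checks the result against a direct nearest-mean search, illustrating that the two rules pick the same class. The function name, means, and test points are illustrative.

```python
import numpy as np


def min_distance_decide(x, means):
    """Return the index i maximizing d_i(x) = x^T m_i - m_i^T m_i / 2 (equation (29))."""
    x = np.asarray(x, dtype=float)
    means = np.asarray(means, dtype=float)
    d = means @ x - 0.5 * np.sum(means * means, axis=1)
    return int(np.argmax(d))


means = [[0.0, 0.0], [10.0, 10.0], [0.0, 10.0]]   # hypothetical class means
for x in ([1.0, 2.0], [9.0, 8.0], [2.0, 7.0]):
    nearest = int(np.argmin(np.linalg.norm(np.asarray(means) - x, axis=1)))
    print(min_distance_decide(x, means), nearest)   # the two indices agree
```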

Results 


The 10 pattern classification techniques discussed above were applied to the 400 selected samples of synthetic aperture radar imagery taken over the Huntsville, Alabama, area. The 400 samples were used as a training set to derive each classifier. They were then submitted to the classifiers to see how well each one would classify its own training set. The accuracies reported below are therefore training-set (resubstitution) accuracies. This section presents the results for each classifier.

For the first eight classification techniques, the four classes were considered two at a time. The final decision went to the class that received the most votes after all six pairwise decisions had been made.

The second-order histogram statistics used as components of the original vector $\mathbf{x}$ require a spacing between pixels in x and y. This spacing was chosen as part of the feature selection process, which determined the $A$ matrix for each pair of classes. Table 1 shows the x and y spacings used to compute the second-order histogram statistics. These spacings provided optimum separation of the feature vector data for each pair of classes.
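The pairwise voting scheme can be sketched as follows. The `decide_pair` callback is hypothetical; it stands in for any of the first eight trained two-class classifiers. The report does not say how ties were broken, so the tie rule here (the first class in a fixed list wins) is an assumption.

```python
from collections import Counter
from itertools import combinations

CLASSES = ["FORESTS", "FIELDS", "CITIES", "WATER"]


def vote(sample, decide_pair):
    """Combine the six pairwise decisions by majority vote.

    decide_pair(sample, c1, c2) is a hypothetical callback standing in for one
    trained two-class classifier; it must return either c1 or c2.
    """
    votes = Counter()
    for c1, c2 in combinations(CLASSES, 2):     # the six class pairs
        votes[decide_pair(sample, c1, c2)] += 1
    top = max(votes.values())
    # Tie rule (assumption): the class listed first in CLASSES wins.
    return next(c for c in CLASSES if votes[c] == top)


# Toy pairwise rule: prefer whichever class ranks higher in a fixed order.
rank = {"WATER": 0, "CITIES": 1, "FIELDS": 2, "FORESTS": 3}
print(vote(None, lambda s, c1, c2: min(c1, c2, key=rank.get)))   # prints WATER
```

With four classes each class takes part in three of the six pairs. A class that wins all three of its pairs therefore cannot be outvoted.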


Table 1. Pixel Spacing for Computing Second-Order Histogram Statistics

Class Pair                  Spacing in X    Spacing in Y
1. FORESTS and FIELDS             5               0
2. FORESTS and CITIES            -3               4
3. CITIES and FIELDS              2               4
4. FORESTS and WATER              1               0
5. FIELDS and WATER               1               0
6. CITIES and WATER               1               0

Figures 1 through 6 plot the $\mathbf{y}$ data for each pair of classes when the increment correction algorithm was used. The line drawn on each figure represents the computed decision boundary. Tables 2 through 9 present the final results for the first eight classifiers. For each classifier, the tables give the percentage of correct classifications for each class and the overall percentage of correct classifications.


¹³ Ibid.


Table 2. Final Results for the Ho-Kashyap Algorithm

              Number of Correct   Number of Incorrect   Percentage of Correct
              Classifications     Classifications       Classifications
1. FORESTS           96                    4                   96%
2. FIELDS            90                   10                   90%
3. CITIES            97                    3                   97%
4. WATER            100                    0                  100%

Overall Percentage of Correct Classifications = 95.75%

Table 3. Final Results for the Increment Correction Algorithm

              Number of Correct   Number of Incorrect   Percentage of Correct
              Classifications     Classifications       Classifications
1. FORESTS           96                    4                   96%
2. FIELDS            90                   10                   90%
3. CITIES            98                    2                   98%
4. WATER            100                    0                  100%

Overall Percentage of Correct Classifications = 96%

Table 4. Final Results for the Least Mean Square Error Algorithm

              Number of Correct   Number of Incorrect   Percentage of Correct
              Classifications     Classifications       Classifications
1. FORESTS           96                    4                   96%
2. FIELDS            90                   10                   90%
3. CITIES            98                    2                   98%
4. WATER            100                    0                  100%

Overall Percentage of Correct Classifications = 96%

Table 5. Final Results for the Method of Potential Functions

              Number of Correct   Number of Incorrect   Percentage of Correct
              Classifications     Classifications       Classifications
1. FORESTS           98                    2                   98%
2. FIELDS            87                   13                   87%
3. CITIES            94                    6                   94%
4. WATER             97                    3                   97%

Overall Percentage of Correct Classifications = 94%



Table 6. Final Results for the Fisher Linear Discriminant Algorithm

              Number of Correct   Number of Incorrect   Percentage of Correct
              Classifications     Classifications       Classifications
1. FORESTS           91                    9                   91%
2. FIELDS            91                    9                   91%
3. CITIES            98                    2                   98%
4. WATER            100                    0                  100%

Overall Percentage of Correct Classifications = 95%

Table 7. Final Results for the Pseudoinverse Technique

              Number of Correct   Number of Incorrect   Percentage of Correct
              Classifications     Classifications       Classifications
1. FORESTS           96                    4                   96%
2. FIELDS            89                   11                   89%
3. CITIES            97                    3                   97%
4. WATER            100                    0                  100%

Overall Percentage of Correct Classifications = 95.5%

Table 8. Final Results for the Widrow-Hoff Procedure

              Number of Correct   Number of Incorrect   Percentage of Correct
              Classifications     Classifications       Classifications
1. FORESTS           96                    4                   96%
2. FIELDS            89                   11                   89%
3. CITIES            95                    5                   95%
4. WATER            100                    0                  100%

Overall Percentage of Correct Classifications = 95%

Table 9. Final Results for the Relaxation Algorithm

              Number of Correct   Number of Incorrect   Percentage of Correct
              Classifications     Classifications       Classifications
1. FORESTS           94                    6                   94%
2. FIELDS            90                   10                   90%
3. CITIES            98                    2                   98%
4. WATER            100                    0                  100%

Overall Percentage of Correct Classifications = 95.5%
The results of applying the Bayes classifier to the 400 samples of radar imagery are considered next. Classification accuracy was evaluated for each class of terrain feature and for various scanning directions (IDIR)* and inter-pixel spacings (IPS). IDIR and IPS are the two parameters used to compute the joint-probability matrices during the feature measurement stage, which precedes classification. The overall classification accuracy was calculated for each combination of these two parameters and is shown in Table 10.
- The best overall classification accuracy, 95.5 percent, was obtained with an IDIR of 0 degrees and an IPS of 2 pixels.
- The same accuracy was also obtained with an IDIR of 135 degrees and an IPS of 3 pixels.
- The least accurate case, 92.75 percent, occurred with an IDIR of 90 degrees and an IPS of 1 pixel.

Similarly, an overall classification accuracy for the minimum distance classifier was computed using the 400 SAR image samples. As expected, the overall classification accuracy in every case was lower than that of the other classifiers.
- The best overall classification accuracy, 91.25 percent, was obtained with a scanning direction of 0 degrees and an inter-pixel spacing of 1 pixel.
- The worst accuracy, 68.25 percent, resulted when the scanning direction was 135 degrees and the inter-pixel spacing was 3 pixels.

Table 11 shows the overall classification accuracy for all cases considered.

The final results of all the classification techniques are shown together in Table 12.

*IDIR is an acronym for INTEGER DIRECTION and is used as such in the computer program.


Table 12. Final Results for All Classification Techniques

Classification Technique               Overall Percentage of Correct Classifications
Ho-Kashyap Algorithm                                  95.75%
Increment Correction Algorithm                        96%
Least Mean Square Error Algorithm                     96%
Method of Potential Functions                         94%
Fisher Linear Discriminant                            95%
Pseudoinverse Technique                               95.5%
Widrow-Hoff Procedure                                 95%
Relaxation Algorithm                                  95.5%
Bayes with Normal Distribution                        95.5%
Minimum Distance Classifier                           91.25%

As Table 12 shows, the results of all the classifiers are close. Nine of the ten techniques fall between 94 and 96 percent, and only the minimum distance classifier is noticeably lower, at 91.25 percent. No single technique stands out from the rest. The method of potential functions was found to be quite complicated and computationally intensive, even though its end result was similar to those of the other techniques.


Conclusions 


1. Applying the 10 pattern classification techniques to a limited set of radar image samples yielded correct classification rates between 91.25 percent and 96.00 percent for the training samples used.

2. Although all 10 classification techniques yielded similar results, they were not all of equal computational complexity.

3. The method of potential functions was found to be difficult to implement and computationally intensive.

4. No relationship was established between the percentage of correct classifications and the number of samples used for training, because the number of training samples was held constant throughout all phases of this work.


Appendix  A.  Feature  Vector  Components 

The following first- and second-order histogram measures were used to construct the feature vectors. The first 13 measures formed the feature vector for the first eight classification techniques; the last two classifiers used all 15 measures.


First-order measures

Mean
$$ \bar{b} = \sum_{b=0}^{L-1} b\,P(b) = x_1 $$

Variance
$$ \sigma_b^2 = \sum_{b=0}^{L-1} (b - \bar{b})^2 P(b) = x_2 $$

Skewness
$$ B_S = \frac{1}{\sigma_b^3}\sum_{b=0}^{L-1} (b - \bar{b})^3 P(b) = x_3 $$

Kurtosis
$$ B_K = \frac{1}{\sigma_b^4}\sum_{b=0}^{L-1} (b - \bar{b})^4 P(b) - 3 = x_4 $$

Energy
$$ B_N = \sum_{b=0}^{L-1} [P(b)]^2 = x_5 $$

Entropy
$$ B_E = -\sum_{b=0}^{L-1} P(b)\log_2[P(b)] = x_6 $$

Second-order measures

Autocorrelation
$$ B_A = \sum_{a=0}^{L-1}\sum_{b=0}^{L-1} a\,b\,P(a,b) = x_7 $$

Covariance
$$ B_C = \sum_{a=0}^{L-1}\sum_{b=0}^{L-1} (a - \bar{a})(b - \bar{b})\,P(a,b) = x_8 $$

Inertia
$$ B_I = \sum_{a=0}^{L-1}\sum_{b=0}^{L-1} (a - b)^2 P(a,b) = x_9 $$

Absolute Value
$$ B_V = \sum_{a=0}^{L-1}\sum_{b=0}^{L-1} |a - b|\,P(a,b) = x_{10} $$

Inverse Difference
$$ B_D = \sum_{a=0}^{L-1}\sum_{b=0}^{L-1} \frac{P(a,b)}{1 + (a - b)^2} = x_{11} $$

Energy
$$ B_N = \sum_{a=0}^{L-1}\sum_{b=0}^{L-1} [P(a,b)]^2 = x_{12} $$

Entropy
$$ B_E = -\sum_{a=0}^{L-1}\sum_{b=0}^{L-1} P(a,b)\log_2[P(a,b)] = x_{13} $$

Mean
$$ \bar{b} = \sum_{a=0}^{L-1}\sum_{b=0}^{L-1} b\,P(a,b) = x_{14} $$

Variance
$$ V_b = \sum_{a=0}^{L-1}\sum_{b=0}^{L-1} (b - \bar{b})^2 P(a,b) = x_{15} $$

Here $L$ is the number of gray levels, and $P(b)$ and $P(a,b)$ are given as

$$ P(b) = \frac{Q(b)}{M} $$

$M$ is the total number of pixels in the sample window; in this case $M$ was equal to 1024. $Q(b)$ is the number of pixels of gray tone $b$ that occur in the sample window.

$$ P(a,b) = \frac{N(a,b)}{M} $$

$N(a,b)$ is the number of times gray tone $a$ is located next to gray tone $b$ at the displacement $\Delta x$ and $\Delta y$.
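The measures $x_1$ through $x_{15}$ can be computed from one window as in the following NumPy sketch. These details are assumptions, not taken from this report:
- The function name is hypothetical.
- $(\Delta x, \Delta y)$ is read as a column and row offset.
- The covariance uses $\bar{a} = \sum\sum a\,P(a,b)$.
- The sketch divides $N(a,b)$ by the number of pixel pairs actually counted rather than by $M$, so that $P(a,b)$ sums to 1.

Skewness and kurtosis are undefined for a window of constant gray tone ($\sigma_b = 0$).

```python
import numpy as np


def histogram_features(window, levels, dx, dy):
    """Return x_1..x_15 of Appendix A for one image window.

    window: 2-D array of integer gray tones in [0, levels).
    (dx, dy): displacement along columns (x) and rows (y) for the co-occurrence.
    """
    img = np.asarray(window, dtype=int)
    g = np.arange(levels, dtype=float)

    # First-order histogram P(b) = Q(b) / M.
    P1 = np.bincount(img.ravel(), minlength=levels) / img.size
    mean = np.sum(g * P1)
    var = np.sum((g - mean) ** 2 * P1)
    sd = np.sqrt(var)
    skew = np.sum((g - mean) ** 3 * P1) / sd ** 3
    kurt = np.sum((g - mean) ** 4 * P1) / sd ** 4 - 3.0
    nz1 = P1 > 0
    first = [mean, var, skew, kurt, np.sum(P1 ** 2),
             -np.sum(P1[nz1] * np.log2(P1[nz1]))]

    # Second-order histogram: pairs (a, b) with b displaced by (dx, dy) from a.
    R, C = img.shape
    r0, r1 = max(0, -dy), min(R, R - dy)
    c0, c1 = max(0, -dx), min(C, C - dx)
    a_vals = img[r0:r1, c0:c1].ravel()
    b_vals = img[r0 + dy:r1 + dy, c0 + dx:c1 + dx].ravel()
    N = np.zeros((levels, levels))
    np.add.at(N, (a_vals, b_vals), 1)
    P2 = N / N.sum()                          # assumption: divide by pair count, not M
    A, B = np.meshgrid(g, g, indexing="ij")   # A[a, b] = a, B[a, b] = b
    mean_a, mean_b = np.sum(A * P2), np.sum(B * P2)
    nz2 = P2 > 0
    second = [
        np.sum(A * B * P2),                          # autocorrelation
        np.sum((A - mean_a) * (B - mean_b) * P2),    # covariance
        np.sum((A - B) ** 2 * P2),                   # inertia
        np.sum(np.abs(A - B) * P2),                  # absolute value
        np.sum(P2 / (1.0 + (A - B) ** 2)),           # inverse difference
        np.sum(P2 ** 2),                             # energy
        -np.sum(P2[nz2] * np.log2(P2[nz2])),         # entropy
        mean_b,                                      # mean
        np.sum((B - mean_b) ** 2 * P2),              # variance
    ]
    return np.array(first + second)


rng = np.random.default_rng(0)
window = rng.integers(0, 16, size=(32, 32))   # hypothetical 16-level, 32 x 32 window
features = histogram_features(window, levels=16, dx=5, dy=0)
print(features.round(3))
```

The usage example applies the spacing listed in Table 1 for the FORESTS and FIELDS pair ($\Delta x = 5$, $\Delta y = 0$) to a random window. It only shows the shape of the computation, not values from the study.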


GLOSSARY 


This report contains a large number of symbols that are easily confused unless each one is strictly defined. This section explains the symbols used most frequently in the report.


$\mathbf{x}$: Original feature vector consisting of thirteen or fifteen components calculated from the first- and second-order histogram statistics of the image samples.

$A$: Feature selection transformation matrix used to reduce the dimensionality of $\mathbf{x}$ from thirteen to two and to optimize the separation between classes. This matrix has dimensionality two by thirteen.

$\mathbf{y}$: Transformed feature vector consisting of two components.

$\hat{\mathbf{y}}$: Transformed feature vector augmented by 1, consisting of three components.

$\mathbf{w}$: Weight vector consisting of three components determined from the training samples. The method of solving for $\mathbf{w}$ depends on the particular pattern classification technique used.

$N$: Total number of pattern points for the two classes.

$\mathbf{b}$: Vector consisting of $N$ arbitrary positive constants.

$Y$: Matrix obtained from the training samples of the two classes. Each row of $Y$ is a sample $\hat{\mathbf{y}}^T$, with the samples from class two multiplied by -1.

$c$: A positive number between zero and one.

$\omega_1$: A representation for pattern class one.

$\omega_2$: A representation for pattern class two.

$\in$: Belongs to, or is in.

$\notin$: Does not belong to, or is not in.

$P(\omega_i \mid \hat{\mathbf{y}})$: The a posteriori probability of class $\omega_i$ given that the vector $\hat{\mathbf{y}}$ has been calculated.

$\alpha_k$: A member of a sequence of positive numbers satisfying the following three conditions:

$$ \lim_{k \to \infty} \alpha_k = 0, \qquad \sum_{k=1}^{\infty} \alpha_k = \infty, \qquad \sum_{k=1}^{\infty} \alpha_k^2 < \infty $$

The sequence used in this report that satisfies these conditions is the harmonic sequence $1/k = (1, 1/2, 1/3, \ldots)$.

$\phi_j(\hat{\mathbf{y}})$: A given set of orthonormal functions. In this report the $\phi_j(\hat{\mathbf{y}})$ were chosen to be a set of Hermite polynomial functions.

$c_j(k)$: Unknown coefficients in an expansion used to approximate the a posteriori probability. A solution for these coefficients is obtained by the method of potential functions.

$\forall$: For all values of.

$S_W$: The within-class scatter matrix calculated from the original vectors $\mathbf{x}$ and the mean vectors.

$\mathbf{m}_1$: Mean vector for class one.

$\mathbf{m}_2$: Mean vector for class two.

$\mathbf{a}$: Weight vector consisting of three components. The solution for $\mathbf{a}$ depends on the particular pattern classification technique used.

$H$: Matrix obtained from the training samples of the two classes. Each row of $H$ is a sample $\hat{\mathbf{y}}^T$, with the samples from class two multiplied by -1. Identical to $Y$.

$\|\hat{\mathbf{y}}^k\| = \left(\sum_{j=1}^{3} (\hat{y}_j^k)^2\right)^{1/2}$: Euclidean norm, or magnitude, of the vector $\hat{\mathbf{y}}^k$.

$d_i(\mathbf{x})$: Decision function.

$\ln$: Logarithm to the base e.

$p(\omega_i)$: A priori probability of the class $\omega_i$.

$C_i$: Covariance matrix for the class $\omega_i$.


